A Comparison of Tagging Strategies for Statistical Information Extraction

Author

  • Christian Siefkes

Abstract

Several approaches model information extraction as a token classification task, using various tagging strategies to combine multiple tokens into a single extracted field. We describe the tagging strategies found in the literature and evaluate their relative performance. We also introduce a new strategy, called Begin/After tagging or BIA, and show that it is competitive with the best other strategies.
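Framing extraction as token classification means a tagging strategy must spread each multi-token field over per-token labels and let a decoder reassemble the fields afterwards. As a minimal sketch, the snippet below shows two schemes widely used in the chunking literature, IOB2 and IOE2, for non-overlapping fields given as `(start, end, type)` token spans with `end` exclusive. The function names, the span format, and the example field types (`speaker`, `time`, `loc`) are illustrative assumptions, not taken from the paper.

```python
from typing import List, Optional, Tuple

# Hypothetical span format: (start, end, type) over token indices, end exclusive.
Span = Tuple[int, int, str]


def encode_iob2(n_tokens: int, spans: List[Span]) -> List[str]:
    """IOB2: first token of each field is B-type, later tokens I-type, others O.

    Assumes spans are non-empty and do not overlap.
    """
    tags = ["O"] * n_tokens
    for start, end, typ in spans:
        tags[start] = f"B-{typ}"
        for i in range(start + 1, end):
            tags[i] = f"I-{typ}"
    return tags


def encode_ioe2(n_tokens: int, spans: List[Span]) -> List[str]:
    """IOE2: last token of each field is E-type, earlier tokens I-type, others O.

    Assumes spans are non-empty and do not overlap.
    """
    tags = ["O"] * n_tokens
    for start, end, typ in spans:
        for i in range(start, end - 1):
            tags[i] = f"I-{typ}"
        tags[end - 1] = f"E-{typ}"
    return tags


def decode_iob2(tags: List[str]) -> List[Span]:
    """Recover (start, end, type) spans from an IOB2 tag sequence."""
    spans: List[Span] = []
    start: Optional[int] = None
    typ: Optional[str] = None
    # A sentinel "O" at the end closes a field that runs to the last token.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and typ == tag[2:]:
            continue  # still inside the current field
        if typ is not None and start is not None:
            spans.append((start, i, typ))  # current field ends before token i
            start, typ = None, None
        if tag.startswith("B-"):
            start, typ = i, tag[2:]  # a new field begins at token i
        # Stray I- tags with no matching open field are ignored in this sketch.
    return spans


if __name__ == "__main__":
    # Hypothetical example: tokens 2-3 hold the speaker, tokens 5-6 the time.
    fields = [(2, 4, "speaker"), (5, 7, "time")]
    print(encode_iob2(7, fields))
    print(encode_ioe2(7, fields))
    print(decode_iob2(encode_iob2(7, fields)))
```

IOB2 marks where each field begins, so two adjacent fields of the same type remain separable: `['B-loc', 'I-loc', 'B-loc']` decodes to two `loc` fields. IOE2 carries the same boundary information at each field's last token instead.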

Similar resources

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP), made harder by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...

Matrix : a statistical method and software tool for linguistic analysis through corpus comparison

Matrix: A statistical method and software tool for linguistic analysis through corpus comparison A thesis submitted to Lancaster University for the degree of Ph.D. in Computer Science Paul Edward Rayson, B.Sc. September 2002 This thesis reports the development of a new kind of method and tool (Matrix) for advancing the statistical analysis of electronic corpora of linguistic data. First, we des...

Optimal Feature Extraction for Discriminating Raman Spectra of Different Skin Samples using Statistical Methods and Genetic Algorithm

Introduction: Raman spectroscopy, a spectroscopic technique based on inelastic scattering of monochromatic light, can provide valuable information about molecular vibrations, so this technique can be used to study molecular changes in a sample. Material and Methods: In this research, 153 Raman spectra were obtained from normal and dried skin samples. Baseline and electrical noise were eliminat...

Name Tagging Using Lexical, Contextual, and Morphological Information

This paper presents a probabilistic model for automatically tagging names in a Turkish text. We used four different information sources to model names, and successfully combined them. Our first information source is based on the surface forms of the words. Then we combined the contextual cues with the lexical model, and obtained a significant improvement. After this, we modeled the mor...

Classification Structuring Tagging

This paper presents an information extraction system that processes the textual content of classified newspaper advertisements in French. The system uses both lexical (words, regular expressions) and contextual information to structure the content of the ads on the basis of predefined thematic forms. The paper first describes the enhanced tagging mechanism used for extraction. A quantitative evalua...

Publication date: 2006